This folder contains data and code to replicate figures 2-7 in the body of the article: "Blocks as geographic discontinuities: The effect of polling place assignment on voting".

The subfolder distance_computation contains the code to compute the distance between a registrant and a polling place, according to the steps outlined in the paper. The replication materials do not contain the sensitive geocode data used in the original analysis. The code to perform the related analysis and an example analysis using publicly available data are included in the distance_computation folder.

Running the provided code requires the R packages tidyverse (1.3.1), fixest (0.10.1), cowplot (1.1.1), Cairo (1.5.14) and scales (1.1.1). Please see:
https://www.tidyverse.org/
https://cran.r-project.org/web/packages/fixest/index.html
https://cran.r-project.org/web/packages/cowplot/readme/README.html
https://cran.r-project.org/web/packages/Cairo/index.html
https://github.com/r-lib/scales
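
As a quick pre-flight check, the sketch below compares the installed package versions against the versions listed above. It is not part of the replication scripts; the names tested, installed_version and report are introduced here for illustration only.

```r
# Versions listed in this README (not enforced by the scripts themselves).
tested <- c(tidyverse = "1.3.1", fixest = "0.10.1", cowplot = "1.1.1",
            Cairo = "1.5.14", scales = "1.1.1")

# Hypothetical helper: installed version as a string, or NA if missing.
installed_version <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA_character_)
  as.character(utils::packageVersion(pkg))
}

# One row per package: the listed version next to the installed one.
installed <- vapply(names(tested), installed_version, character(1))
report <- data.frame(package = names(tested), tested = unname(tested),
                     installed = unname(installed), stringsAsFactors = FALSE)
print(report)
```

A differing version does not necessarily break the scripts; an NA in the installed column means the package still needs to be installed.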

The code here was tested in both macOS and Linux environments. All code was tested on a single-core CPU with 16 GB of RAM.

With the exception of replication_a8, no file takes more than a few minutes to run. This single file (replication_a8) takes 20-30 minutes to run.

The R code was run with R version 4.1.2. Refer to the README in the distance_computation folder for information on the Python requirements.

The code expects the data files to be stored in a directory one level up from the R working directory. We recommend you set the R working directory to replication_code.
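
A minimal sketch of this layout, assuming the working directory is replication_code and the CSV files sit in its parent directory. The helper data_path is hypothetical and does not appear in the replication scripts.

```r
# Hypothetical helper: build a path to a data file one level above
# the current working directory.
data_path <- function(file_name) file.path("..", file_name)

# Example: read the main distance data set if it is present.
main_file <- data_path("distance_main_effect_data.csv")
if (file.exists(main_file)) {
  distance_data <- read.csv(main_file, stringsAsFactors = FALSE)
  str(distance_data)
} else {
  message("Not found: ", main_file,
          " (check the working directory with getwd())")
}
```

If the file is reported as not found, the working directory is probably not replication_code.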

To produce all plots, run run_all.sh. Table a2 requires proprietary data and thus we cannot include it here.

To produce figures 2 and 5, refer to the script replication_main_effect.R. This script utilizes data in the files distance_main_effect_data.csv and shock_main_effect_data.csv.

To produce figures 3 and 6, refer to the script replication_scatter_plots.R. This script also uses the distance_main_effect_data.csv and shock_main_effect_data.csv files.

To produce figure 4, refer to the script replication_figure_four.R, which uses both the distance_main_effect_data.csv file and the states_substitution_voting.csv file (the latter is loaded within the file plotting.R).

To produce figure 7, refer to the script replication_figure_seven.R, which uses two files: shock_windows_attributes.csv and shock_windows_votes.csv.


To produce figures a1-a6 in the appendix, refer to the script replication_a1-6.R. This script utilizes data in the files: distance_balance_plot_main.csv, distance_real_estate_balance_plot.csv, shock_balance_plot_main.csv and shock_real_estate_balance_plot.csv.

To produce figure a7 in the appendix, refer to the script replication_a7.R. This script utilizes the data in the file: shock_historical_voting.csv.

To produce figure a8 in the appendix, refer to the script replication_a8.R. This script utilizes the data in the files: distance_windows_votes.csv and distance_windows_attributes.csv.

To produce figure a9 in the appendix, refer to the script replication_a9.R. This script utilizes the data in the files: distance_main_effect_data_cutoff_p1.csv, distance_main_effect_data_cutoff_p5.csv, shock_main_effect_data_cutoff_p1.csv, and shock_main_effect_data_cutoff_p5.csv.

Figure a10a is the same as figure 2. To produce figure a10b in the appendix, refer to the script replication_a10.R. This script utilizes data in the file distance_main_effect_10_b.csv.

The data file distance_main_effect_data.csv contains the following variables:
state: state abbreviation
block_id: a unique state, county, road, block # identifier 
household: a unique hashed household id
treatment: "treatment" if assigned to the further polling place to block; "control" if assigned to closer polling place to block
distance_to_pp_2016: average distance in household to assigned polling place
voted_2016: "p" if voted in-person; "a" if voted absentee; "e" if voted early in-person; "m" if voted by mail, NA if didn't vote
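
To illustrate how the voted_2016 coding can be handled in R, the sketch below builds a small made-up data frame with the columns described above and derives a turnout indicator. The toy values and the column name voted_any are assumptions for illustration; this README does not state how the replication scripts define their outcome variables.

```r
# Made-up rows that mimic the documented columns (not real data).
toy <- data.frame(
  block_id   = c("b1", "b1", "b2", "b2", "b3", "b3"),
  treatment  = c("treatment", "control", "treatment",
                 "control", "treatment", "control"),
  voted_2016 = c("p", NA, "m", "e", "a", NA),
  stringsAsFactors = FALSE
)

# NA means the registrant did not vote; any code means they voted.
toy$voted_any <- as.integer(!is.na(toy$voted_2016))

# Counts by voting mode, keeping non-voters (NA) visible.
print(table(toy$voted_2016, useNA = "ifany"))

# Share voting by treatment status in the toy data.
turnout <- aggregate(voted_any ~ treatment, data = toy, FUN = mean)
print(turnout)
```

When reading the real CSV files with read.csv, the literal string NA is treated as missing by default, so the same is.na() check applies.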

The data file shock_main_effect_data.csv contains the following variables:
state: state abbreviation
block_id: a unique hashed state, county, road, block # identifier 
household: a unique hashed household id
treatment: "treatment" if assigned to a new polling place in 2016; "control" if assigned to same polling place in 2016 as in 2012
distance_to_pp_2016: average distance in household to assigned polling place in 2016
voted_2016: "p" if voted in-person; "a" if voted absentee; "e" if voted early in-person; "m" if voted by mail, NA if didn't vote

The data file shock_windows_attributes.csv contains the following variables:
state: state abbreviation
block_id: a unique state, county, road, block # identifier 
treatment: "treatment" if assigned to a new polling place in 2016; "control" if assigned to same polling place in 2016 as in 2012
num_people: number of registrants on block with this treatment status
num_white: number of registrants on block with this treatment status who are classified as being White
num_rural: number of registrants on block with this treatment status who are classified as living in a rural area
change_in_distance: change in the average distance on block to assigned polling place in 2016 relative to assigned polling place in 2012

The data file shock_windows_votes.csv contains the following variables:
state: state abbreviation
block_id: a unique hashed state, county, road, block # identifier 
change_in_distance: change in the average distance on block to assigned polling place in 2016 relative to assigned polling place in 2012
voted_2016: "p" if voted in-person; "a" if voted absentee; "e" if voted early in-person; "m" if voted by mail, NA if didn't vote
treatment: "treatment" if assigned to a new polling place in 2016; "control" if assigned to same polling place in 2016 as in 2012
household_id: a unique hashed household id


The data files distance_balance_plot_main.csv and shock_balance_plot_main.csv contain the following variables:
state: state abbreviation
block_id: a unique hashed state, county, road, block # identifier 
household: a unique hashed household id
treatment: in the distance file, "treatment" if assigned to the further polling place to block; "control" if assigned to closer polling place to block. In the shock file, "treatment" if assigned to a new polling place in 2016; "control" if assigned to same polling place in 2016 as in 2012
party: the party that this person indicated in their registration, if included in the record.
gender: female, male or unknown. 
race: one of African American, Asian, Caucasian, Hispanic, Native American, Other or Uncoded.
age: The age of the registrant at time of registration. 
county: The county the registrant registered in. 
total_in_assignment: The total number of people in this treatment assignment for this block id. 

The data files distance_real_estate_balance_plot.csv and shock_real_estate_balance_plot.csv contain the following variables:
state: state abbreviation
block_id: a unique hashed state, county, road, block # identifier 
household: a unique hashed household id
treatment: in the distance file, "treatment" if assigned to the further polling place to block; "control" if assigned to closer polling place to block. In the shock file, "treatment" if assigned to a new polling place in 2016; "control" if assigned to same polling place in 2016 as in 2012
home_value_raw: The imputed home value of the household. 


The data file shock_historical_voting.csv contains the following variables: 
state: state abbreviation
block_id: a unique hashed state, county, road, block # identifier 
household: a unique hashed household id
treatment: "treatment" if assigned to a new polling place in 2016; "control" if assigned to same polling place in 2016 as in 2012	
voted_2012: "p" if voted in-person; "a" if voted absentee; "e" if voted early in-person; "m" if voted by mail, NA if didn't vote

The data file distance_windows_votes.csv contains the following variables:
state: state abbreviation
block_id: a unique hashed state, county, road, block # identifier 
household: a unique hashed household id
change_in_distance: The difference in distance between the treatment and control group for this block id.
voted_2016: "p" if voted in-person; "a" if voted absentee; "e" if voted early in-person; "m" if voted by mail, NA if didn't vote
treatment: "treatment" if assigned to the further polling place to block; "control" if assigned to closer polling place to block

The data file distance_windows_attributes.csv contains the following variables:
state: state abbreviation
block_id: a unique hashed state, county, road, block # identifier 
treatment: "treatment" if assigned to the further polling place to block; "control" if assigned to closer polling place to block
change_in_distance: The difference in distance between the treatment and control group for this block id.
num_people: Total number of people in this block id and treatment condition.
num_white: Total number of Caucasian people in this block id and treatment condition.
num_rural: Total number of people within a rural county in this block id and treatment condition.

The data files distance_main_effect_data_cutoff_p1.csv, distance_main_effect_data_cutoff_p5.csv, shock_main_effect_data_cutoff_p1.csv, shock_main_effect_data_cutoff_p5.csv, and distance_main_effect_10_b.csv contain the following variables:
state: state abbreviation
block_id: a unique hashed state, county, road, block # identifier 
household: a unique hashed household id
treatment: in the distance files, "treatment" if assigned to the further polling place to block; "control" if assigned to closer polling place to block. In the shock files, "treatment" if assigned to a new polling place in 2016; "control" if assigned to same polling place in 2016 as in 2012
distance_to_pp_2016: average distance in household to assigned polling place in 2016
voted_2016: "p" if voted in-person; "a" if voted absentee; "e" if voted early in-person; "m" if voted by mail, NA if didn't vote






